7. Conclusion and Future Work Novel Objective Function for Improved Phoneme Recognition Using Time-delay Neural Networks. 5. Simulation Results
Author
Abstract
recognize 10 isolated letters and used artificial markers on the lips. No visual feature extraction was integrated into their model. Also of interest are some psychological studies of human speechreading and their approaches to describing human performance. These measurements could also be applied to the performance analysis of automated speechreading systems. Dodd and Campbell [3], and Demorest and Bernstein [2] did some valuable work in this area.

We have shown how a state-of-the-art speech recognition system can be improved by considering additional visual information in the recognition process. This holds for optimal recording conditions and even more so for non-optimal recording conditions, which are typical of real-world applications. Experiments were performed on the connected letter recognition task, but similar results can be expected for continuous speech recognition as well. Work is in progress to integrate not only time-independent weight sharing but also position-independent weight sharing into the visual TDNN, in order to locate and track the lips. We are also substantially enlarging our database in order to achieve better recognition rates and to train speaker-independently. Investigations of different approaches to combining visual and acoustic features and to applying different preprocessing to the visual data are still in progress.

ACKNOWLEDGEMENTS We appreciate the help from the DEC on campus research center (CEC) for the initial data acquisition.

classify /b/ and /p/ based only on visual information would lead to recognition rates no better than guessing, or the net might become sensitive to features that are uncorrelated with the produced speech. This leads to the design of a smaller set of visually distinguishable units in speech, so-called "visemes". We investigate a new set of 42 visemes and a 1-to-n mapping from the viseme set to the phoneme set.
The mapping is necessary for the combined layer, in order to calculate the combined acoustic and visual hypotheses for the DTW layer. For example, the hypotheses for /b/ and /p/ are built from the same viseme /b_or_p/ but from the different phonemes /b/ and /p/, respectively. Our database consists of 114 and 350 letter sequences spelled by two male speakers. They consist of names and random sequences. The first data set was split into 75 training and 39 test sequences (speaker msm). The second data set was split into 200 training and 150 test sequences (speaker mcb). …
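The role of the 1-to-n viseme-to-phoneme mapping in the combined layer can be illustrated with a minimal sketch. Only the viseme /b_or_p/ and the phonemes /b/ and /p/ come from the text above. The extra table entry, the function names, the example scores, and the weighted-sum combination rule are all assumptions made for illustration; the text does not state how the two hypotheses are actually merged.

```python
# Hypothetical 1-to-n viseme-to-phoneme table. Only "b_or_p" -> /b/, /p/
# is taken from the text; the "a" entry is an illustrative placeholder.
VISEME_TO_PHONEMES = {
    "b_or_p": ["b", "p"],
    "a": ["a"],
}


def phoneme_to_viseme(table):
    """Invert the 1-to-n table so each phoneme points to its single viseme."""
    inverse = {}
    for viseme, phonemes in table.items():
        for phoneme in phonemes:
            if phoneme in inverse:
                raise ValueError(f"phoneme {phoneme!r} maps to two visemes")
            inverse[phoneme] = viseme
    return inverse


def combine_hypotheses(acoustic, visual, table, visual_weight=0.5):
    """Combine per-phoneme acoustic scores with per-viseme visual scores.

    ASSUMPTION: a weighted sum is used here purely as an example of a
    combination rule; it is not taken from the source text.
    """
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    inverse = phoneme_to_viseme(table)
    return {
        phoneme: (1.0 - visual_weight) * score
        + visual_weight * visual[inverse[phoneme]]
        for phoneme, score in acoustic.items()
    }


# Made-up scores: /b/ and /p/ share the same visual evidence, so only the
# acoustic scores can separate them.
acoustic_scores = {"b": 0.6, "p": 0.2, "a": 0.2}
visual_scores = {"b_or_p": 0.9, "a": 0.1}
combined = combine_hypotheses(acoustic_scores, visual_scores, VISEME_TO_PHONEMES)
```

In this sketch the hypotheses for /b/ and /p/ receive an identical visual contribution from /b_or_p/, so their ranking is decided by the acoustic scores alone, which mirrors the point that the two phonemes cannot be told apart visually.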
Similar resources
Novel Objective Function for Improved Phoneme Recognition Using Time-delay Neural Networks. Vii. Conclusion and Future Work Iv. Phoneme and Viseme Coding
In this paper we show how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called “speechreading”. We show this on an extension of an existing state-of-the-art speech recognition system, a modular MS-TDNN. The acoustic and visual speech data is preclassified in two separate front-end phoneme TDNNs and com...
Distribution Systems Reconfiguration Using Pattern Recognizer Neural Networks
A novel intelligent neural optimizer with two objective functions is designed for electrical distribution systems. The presented method is faster than alternative optimization methods and is comparable with the most powerful and precise ones. This optimizer is much smaller than similar neural systems. In this work, two intelligent estimators are designed, a load flow program is coded, and a spe...
Continuous Speech Phoneme Recognition Using Dynamic Artificial Neural Networks
Phoneme classification and recognition is the first step to large vocabulary continuous speech recognition. This step represents the acoustic modeling part of such a system. In hybrid speech recognition systems phoneme recognition is made by artificial neural networks (ANN’s). The main objective of this paper is the investigation of dynamic ANN’s, namely the Time-Delay Neural Networks (TDNN) an...
New variant of the Self Organizing Map in Pulsed Neural Networks to Improve Phoneme Recognition in Continuous Speech
Speech recognition has gradually improved over the years, phoneme recognition in particular. Phoneme recognition plays a very important role in speech processing. Phoneme strings are a basic representation for automatic language recognition, and it has been shown that language recognition results are highly correlated with phoneme recognition results. Nowadays, many recognizers are based on Artificial ne...
AN IMPROVED CONTROLLED CHAOTIC NEURAL NETWORK FOR PATTERN RECOGNITION
A sigmoid function is necessary for creating a chaotic neural network (CNN). In this paper, a new function for the CNN is proposed that can increase the speed of convergence. In the proposed method, we use a novel signal for controlling chaos. Both the theoretical analysis and computer simulation results show that the performance of the CNN can be improved remarkably by using our method. By means of this...